EvoMining 2.0: A customizable computational pipeline for evolutionary reconstructions during genome mining

Selem-Mojica Nelly, Cruz-Morales Pablo, Martínez-Guerrero Christian, …, and Barona-Gómez Francisco

Abstract

Microbial natural products are important for human health and life. Owing to the abundance of genomic and metagenomic data, natural product discovery by genome mining is a growing field. Traditional genome mining approaches explore bacterial genomes by locating signatures of previously known secondary metabolism enzymes organized in biosynthetic gene clusters (BGCs). Here we present EvoMining, a downloadable visual genome mining tool that incorporates evolutionary theory into genome mining. EvoMining databases are customizable, and the method is based on enzyme family expansions rather than on BGCs. The advantage of this approach is that every expanded enzyme family is a candidate in which to explore recruitments, and that any prokaryotic genome can be analyzed, even those from the largely unexplored Archaea. In this study EvoMining was applied to several genomic databases (Cyanobacteria, Actinobacteria, Pseudomonas and Archaea) to study expansions of enzyme families such as TauD and other enzymes recently recruited into secondary metabolism. Finally, the genomic plasticity of the known BGCs of Streptomyces coelicolor is explored by generalizing the open/closed pangenome approach to BGCs. These evolutionary methods open the door to discovering previously unknown chemical compounds in private genome collections and to prioritizing them according to their genomic plasticity.

Introduction

Natural products are synthesized by enzymes encoded in biosynthetic gene clusters (BGCs) found in the genomes of a wide range of microorganisms. Enzymes that belong to a BGC can either be largely restricted to secondary metabolism or be recent recruitments acting as accessory enzymes.
In the genomic era, with 500,000 prokaryotic genomes available at NCBI, there has been a boom in the development of specialized genome mining software. Traditional approaches are based on recognizing signatures of enzymes devoted to secondary metabolism (???), or of domains (???), and lately on evolution (??? nadine).
EvoMining builds on the observation that, in prokaryotic genomes, enzyme families are frequently expanded by either gene duplication or horizontal gene transfer, and that these expansions act as evolutionary raw material that can be recruited into secondary metabolism to perform novel chemical functions. A proof of concept of the EvoMining idea was provided by the discovery of an arseno-compound in Streptomyces coelicolor (Cruz-Morales et al. 2016).

Although EvoMining analyses have recently been discussed in the natural products field (Blin et al. 2017; Alanjary et al. 2017; Ziemert, Alanjary, and Weber 2016; Miller, Chevrette, and Kwan 2017), the EvoMining software has not been released. In this work we release EvoMining as a downloadable stand-alone tool implemented in a Docker container. EvoMining is free and open to all users, and there is no login requirement. Although Actinobacteria are great natural product producers (???), other microorganisms can also be explored.

Here we present the EvoMining expansion analysis using different genomic databases (genomic-DB): Actinobacteria, Cyanobacteria, Pseudomonas and Archaea. To enrich the central-DB, we incorporated an example of what we call backward EvoMining: BGCs from S. coelicolor available in MIBiG were analyzed backwards with EvoMining, and all enzyme families that were expanded but not overrepresented were followed.

Finally, to prioritize the clusters likely to produce more metabolite variants, and assuming a link between genomic and metabolite plasticity, we extend the classification of pangenomes as open or closed, which is based on their saturation, to BGCs, classifying each one as an open or closed BGC.

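library(knitr)  # provides kable()
# Load the tab-separated summary of the S. coelicolor BGCs found in MIBiG
table <- read.csv("Figura3MiBIG/CoelicolorMiBIG", row.names = 1, sep = "\t")
kable(table, caption = "Coelicolor\\label{tab:Coelicolor MiBig}", caption.short = "CoelicolorMiBig ")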
Coelicolor

| BGC | Full or partial | Main product | Biosynthetic class | Organism | # Backward EvoMining hits | Open/closed |
|---|---|---|---|---|---|---|
| BGC0000038 | Full | coelimycin | Polyketide | Streptomyces coelicolor A3(2) | NA | NA |
| BGC0000194 | Full | actinorhodin | Polyketide | Streptomyces coelicolor A3(2) | NA | NA |
| BGC0000315 | Full | calcium-dependent antibiotic | NRP | Streptomyces coelicolor A3(2) | NA | NA |
| BGC0000551 | Full | sapB | RiPP | Streptomyces coelicolor A3(2) | NA | NA |
| BGC0000595 | Full | SCO-2138 | RiPP | Streptomyces coelicolor A3(2) | NA | NA |
| BGC0000849 | Full | gamma-butyrolactone | Other | Streptomyces coelicolor A3(2) | NA | NA |
| BGC0000940 | Full | desferrioxamine B | Other | Streptomyces coelicolor A3(2) | NA | NA |
| BGC0000324 | Partial | coelibactin | NRP | Streptomyces coelicolor A3(2) | NA | NA |
| BGC0000325 | Partial | coelichelin | NRP | Streptomyces coelicolor A3(2) | NA | NA |
| BGC0000660 | Partial | albaflavenone | Terpene | Streptomyces coelicolor A3(2) | NA | NA |
| BGC0000663 | Partial | hopene | Terpene | Streptomyces coelicolor A3(2) | NA | NA |
| BGC0000910 | Partial | melanin | Other | Streptomyces coelicolor A3(2) | NA | NA |
| BGC0000914 | Partial | methylenomycin | Other | Streptomyces coelicolor A3(2) | NA | NA |
| BGC0001063 | Partial | undecylprodigiosin | NRP / Polyketide | Streptomyces coelicolor A3(2) | NA | NA |
| BGC0001181 | Partial | geosmin | Terpene | Streptomyces coelicolor A3(2) | NA | NA |

## Results and Discussion

Figure 1 EvoMining pipeline

EvoMining is a visual, evolution-based genome mining tool whose aim is to prioritize non-standard secondary metabolite pathways. The algorithm follows enzyme families from central pathways as they are recruited as components of natural product biosynthetic gene clusters (BGCs) within a genomic database.

Pipeline

EvoMining takes three inputs: (1) a custom genomic database (genomic-DB), (2) a central pathways database (central-DB) and (3) a natural product database (natural-DB) composed of genes that belong to experimentally tested BGCs. These three databases are provided and can be modified, replaced and expanded by the user. In this work each genomic-DB is a collection of up-to-date genomes in RAST format from taxonomically related organisms: Actinobacteria, Cyanobacteria, Pseudomonas and Archaea. These taxa were selected to allow comparison of well-known natural product (NP) producing organisms, such as Actinobacteria and Cyanobacteria, with Archaea, which have been poorly investigated. The central-DB contains nine previously curated central pathways from Actinobacteria (Barona-Gómez, Cruz-Morales, and Noda-García 2012), plus an update of seed metabolic enzymes identified by manual curation in line with the central EvoMining paradigm. The natural-DB currently comprises all sequences that belong to BGCs from the Minimum Information about a Biosynthetic Gene cluster (MIBiG) repository (Medema et al. 2015).

As output, EvoMining identifies in the genomic-DB those expanded families from the central-DB that have at least one member recruited into the natural-DB, and then reconstructs the evolutionary history of each such enzyme family. Given an enzyme from the central-DB, the product of the EvoMining analysis is a color-coded tree of the expanded enzyme family that provides information about the metabolic fate of its members. Specifically, enzymes from central metabolism are differentiated from known natural product enzymes, and those expansions with potential activity in secondary metabolism are emphasized as putative novel recruitments. Further analysis of these hits allows visualization of their genomic vicinity, guiding the discovery of novel BGCs. In addition to the updates to the EvoMining workflow, the version to be released will include the possibility of defining the dynamics of the gene content of any given BGC, to explore the chemical plasticity related to EvoMining hits. This makes it possible to prioritize the clusters likely to produce more metabolite variants, thereby unmasking biosynthetic dark matter (Medema and Fischbach 2015; Blin et al. 2017).
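The family selection step described above can be illustrated with a minimal Python sketch. It assumes that homology search results are already available as plain dictionaries, and it treats a family as expanded when at least one genome carries more than one copy; both the input format and this expansion criterion are simplifying assumptions made for illustration, not the threshold used by EvoMining. The family names `EnzA` and `EnzB` and all identifiers are hypothetical.

```python
from collections import Counter


def copies_per_genome(hits):
    """Count family copies per genome from (genome_id, protein_id) pairs."""
    return Counter(genome for genome, _protein in hits)


def is_expanded(hits):
    """Assumed criterion: some genome carries more than one copy of the family."""
    return any(count > 1 for count in copies_per_genome(hits).values())


def families_to_reconstruct(family_hits, natural_hits):
    """Return families that are expanded in the genomic-DB and have at least
    one member matching a natural-DB sequence.

    family_hits:  {family: [(genome_id, protein_id), ...]}  (genomic-DB hits)
    natural_hits: {family: [natural_db_sequence_id, ...]}   (natural-DB hits)
    """
    return sorted(
        family
        for family, hits in family_hits.items()
        if is_expanded(hits) and natural_hits.get(family)
    )


# Toy input: only TauD is both expanded and recruited.
family_hits = {
    "TauD": [("g1", "p1"), ("g1", "p2"), ("g2", "p3")],  # two copies in g1
    "EnzA": [("g1", "p4"), ("g2", "p5")],                # single copy everywhere
    "EnzB": [("g1", "p6"), ("g1", "p7")],                # expanded, no NP hit
}
natural_hits = {"TauD": ["np1"], "EnzA": ["np2"], "EnzB": []}
selected = families_to_reconstruct(family_hits, natural_hits)
```

Only the families returned by `families_to_reconstruct` would go on to the tree reconstruction step.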

EvoMining code and its components (BLAST, MUSCLE, FastTree, Newick Utilities, Gblocks, Apache and the SVG Perl module) are wrapped in the Docker container nselem/newevomining, downloadable from Docker Hub. The code is available on GitHub at nselem/EvoMining and the manual at https://github.com/nselem/EvoMining/wiki. The EvoMining tool will allow researchers to examine their own genomes and their own enzyme families in search of expansions involved in novel secondary metabolism.

EvoMining identifies those expanded families of the central-DB within the genomic-DB that have at least one member recruited into the natural-DB, and then reconstructs the evolutionary history of the enzyme family. Given an enzyme from the central-DB, the product of the EvoMining analysis is an interactive color-coded tree of the expanded enzyme family. In this tree, best bidirectional hits (BBH) of central-DB enzymes are differentiated from natural product members, and those expansions that are close to a natural product sequence but are not BBH of central-DB enzymes are emphasized as putative novel recruitments into secondary metabolism.
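The labelling of tree members can be sketched in a few lines of Python. The sketch assumes that similarity scores in both search directions are already available as nested dictionaries, and that the set of members lying close to a natural product sequence in the tree has been computed beforehand. The function names, the label strings and the input layout are hypothetical and only illustrate the logic; they are not EvoMining's own code.

```python
def best_hit(scores):
    """Return the target with the highest score, or None if there are no hits."""
    return max(scores, key=scores.get) if scores else None


def is_bbh(central, member, central_to_genome, genome_to_central):
    """Best bidirectional hit: each sequence is the other's top-scoring hit."""
    return (
        best_hit(central_to_genome.get(central, {})) == member
        and best_hit(genome_to_central.get(member, {})) == central
    )


def label_member(member, central, central_to_genome, genome_to_central,
                 natural_ids, near_natural):
    """Assign one of the tree categories to a family member."""
    if member in natural_ids:
        return "natural product"
    if is_bbh(central, member, central_to_genome, genome_to_central):
        return "central metabolism"
    if member in near_natural:
        return "putative recruitment"
    return "other expansion"


# Toy data: central enzyme "c1" searched against one genome with three copies.
central_to_genome = {"c1": {"p1": 300.0, "p2": 120.0, "p3": 90.0}}
genome_to_central = {"p1": {"c1": 300.0}, "p2": {"c1": 120.0}, "p3": {"c1": 90.0}}
natural_ids = {"np1"}
near_natural = {"p2"}  # assumed to be precomputed from the tree

labels = {
    m: label_member(m, "c1", central_to_genome, genome_to_central,
                    natural_ids, near_natural)
    for m in ["p1", "p2", "p3", "np1"]
}
```

Here `p1` is the reciprocal best hit of the central enzyme, so only the extra copy that sits near a natural product sequence is flagged as a putative recruitment.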

Figure 2 Expansions on some databases

Archaea, Cyanobacteria and Actinobacteria, based on central metabolism from Actinobacteria.
To narrow the search for enzymes recently recruited into natural products: TauD.

### Figure 3.1 Expansions on genomic dynamics
3.2 (Backward EvoMining)
Coelicolor clusters

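library(knitr)  # provides kable()
# Load the tab-separated expansion statistics for each gene of the S. coelicolor BGCs
tableExp <- read.csv("Figura3MiBIG/ExpansionBlast.data", row.names = 1, sep = "\t")
kable(tableExp, caption = "CoelicolorExpansions\\label{tab:Coelicolor Expansions}", caption.short = "CoelicolorExpansions")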
CoelicolorExpansions
Gene Copies OrganismPercentage Organisms Expansions ExpOverOrg ExpNum Function
actinorhodin|6|74_1|Scoe 636 0.2663212 514 122 1.237354 0.1918239 hydroxylacyl-CoA dehydrogenase
actinorhodin|6|75_1|Scoe 10884 0.8937824 1725 9159 6.309565 0.8415105 3-hydroxyacyl-CoA dehydrogenase (EC 1.1.1.35)
actinorhodin|6|76_1|Scoe 20061 0.9082902 1753 18308 11.443811 0.9126165 FIG01127617: hypothetical protein
actinorhodin|6|77_1|Scoe 196 0.0963731 186 10 1.053763 0.0510204 hypothetical protein
actinorhodin|6|78_1|Scoe 20077 0.9098446 1756 18321 11.433371 0.9125367 Bifunctional protein: zinc-containing alcohol dehydrogenase; quinone oxidoreductase ( NADPH:quinone reductase) (EC 1.1.1.-); Similar to arginate lyase
actinorhodin|6|79_1|Scoe 20002 0.9093264 1755 18247 11.397151 0.9122588 putative integral membrane protein
actinorhodin|6|80_1|Scoe 756 0.3015544 582 174 1.298969 0.2301587 hypothetical protein
actinorhodin|6|81_1|Scoe 125 0.0507772 98 27 1.275510 0.2160000 FIG01121294: hypothetical protein
actinorhodin|6|82_1|Scoe 3722 0.7388601 1426 2296 2.610098 0.6168726 FIG01124094: hypothetical protein
actinorhodin|6|83_1|Scoe 3473 0.6191710 1195 2278 2.906276 0.6559171 Acyl-CoA dehydrogenase
actinorhodin|6|84_1|Scoe 614 0.2487047 480 134 1.279167 0.2182410 FIG01134414: hypothetical protein
actinorhodin|6|85_1|Scoe 14217 0.8829016 1704 12513 8.343310 0.8801435 Transcriptional regulator, TetR family
actinorhodin|6|86_1|Scoe 20004 0.9098446 1756 18248 11.391799 0.9122176 FIG01129015: hypothetical protein
actinorhodin|6|87_1|Scoe 13480 0.9202073 1776 11704 7.590090 0.8682493 Hopanoid-associated RND transporter, HpnN
actinorhodin|6|88_1|Scoe 17061 0.8191710 1581 15480 10.791271 0.9073325 actinorhodin cluster activator protein
actinorhodin|6|89_1|Scoe 20000 0.9492228 1832 18168 10.917031 0.9084000 Short-chain dehydrogenase/reductase SDR
actinorhodin|6|90_1|Scoe 25875 0.9145078 1765 24110 14.660057 0.9317874 Polyketide beta-ketoacyl synthase WhiE-KS paralog
actinorhodin|6|91_1|Scoe 25118 0.9139896 1764 23354 14.239229 0.9297715 Polyketide chain length factor WhiE-CLF paralog
actinorhodin|6|92_1|Scoe 1728 0.5549223 1071 657 1.613445 0.3802083 Acyl carrier protein
actinorhodin|6|93_1|Scoe 1729 0.5569948 1075 654 1.608372 0.3782533 actinorhodin polyketide synthase bifunctional cyclase/dehydratase
actinorhodin|6|94_1|Scoe 3237 0.7668394 1480 1757 2.187162 0.5427865 putative polyketide cyclase
actinorhodin|6|95_1|Scoe 7433 0.8518135 1644 5789 4.521289 0.7788242 NADH-FMN oxidoreductase
albaflavenone|7|96_1|Scoe 3313 0.6005181 1159 2154 2.858499 0.6501660 FIG00456465: hypothetical protein
albaflavenone|7|97_1|Scoe 15775 0.8538860 1648 14127 9.572209 0.8955309 putative cytochrome P450
calcium-dependent_antibiotic|4|23_1|Scoe 2914 0.9217617 1779 1135 1.637999 0.3894990 2-keto-3-deoxy-D-arabino-heptulosonate-7-phosphate synthase II (EC 2.5.1.54)
calcium-dependent_antibiotic|4|24_1|Scoe 2198 0.9445596 1823 375 1.205705 0.1706096 Indole-3-glycerol phosphate synthase (EC 4.1.1.48)
calcium-dependent_antibiotic|4|25_1|Scoe 2139 0.9150259 1766 373 1.211212 0.1743806 Anthranilate phosphoribosyltransferase (EC 2.4.2.18)
calcium-dependent_antibiotic|4|26_1|Scoe 6890 0.9829016 1897 4993 3.632051 0.7246734 Anthranilate synthase, amidotransferase component (EC 4.1.3.27)
calcium-dependent_antibiotic|4|27_1|Scoe 7434 0.9626943 1858 5576 4.001076 0.7500673 Anthranilate synthase, aminase component (EC 4.1.3.27)
calcium-dependent_antibiotic|4|28_1|Scoe 149 0.0766839 148 1 1.006757 0.0067114 hypothetical protein
calcium-dependent_antibiotic|4|29_1|Scoe 12084 0.9787565 1889 10195 6.397036 0.8436776 Cation-transporting ATPase, E1-E2 family
calcium-dependent_antibiotic|4|30_1|Scoe 20684 0.8082902 1560 19124 13.258974 0.9245794 FIG01132787: hypothetical protein
calcium-dependent_antibiotic|4|31_1|Scoe 4960 0.7849741 1515 3445 3.273927 0.6945565 Polymyxin synthetase PmxB
calcium-dependent_antibiotic|4|32_1|Scoe 2216 0.5440415 1050 1166 2.110476 0.5261733 putative lipase (putative secreted protein)
calcium-dependent_antibiotic|4|33_1|Scoe 40 0.0165803 32 8 1.250000 0.2000000 putative secreted protein
calcium-dependent_antibiotic|4|34_1|Scoe 2082 0.8953368 1728 354 1.204861 0.1700288 Arogenate dehydrogenase (EC 1.3.1.43)
calcium-dependent_antibiotic|4|35_1|Scoe 877 0.3725389 719 158 1.219750 0.1801596 secreted protein
calcium-dependent_antibiotic|4|36_1|Scoe 4941 0.8139896 1571 3370 3.145131 0.6820482 Thiamin ABC transporter, transmembrane component
calcium-dependent_antibiotic|4|37_1|Scoe 20006 0.9772021 1886 18120 10.607635 0.9057283 putative ABC transporter ATP-binding protein
calcium-dependent_antibiotic|4|38_1|Scoe 20150 0.8663212 1672 18478 12.051435 0.9170223 two component sensor kinase
calcium-dependent_antibiotic|4|39_1|Scoe 20000 0.8808290 1700 18300 11.764706 0.9150000 DNA-binding response regulator, LuxR family
calcium-dependent_antibiotic|4|40_1|Scoe 8235 0.9487047 1831 6404 4.497542 0.7776563 putative aminotransferase
calcium-dependent_antibiotic|4|41_1|Scoe 2983 0.7601036 1467 1516 2.033402 0.5082132 (S)-2-hydroxy-acid oxidase (EC 1.1.3.15)
calcium-dependent_antibiotic|4|42_1|Scoe 3205 0.7953368 1535 1670 2.087948 0.5210608 4-hydroxyphenylpyruvate dioxygenase (EC 1.13.11.27)
calcium-dependent_antibiotic|4|43_1|Scoe 197580 0.8176166 1578 196002 125.209125 0.9920134 Siderophore biosynthesis non-ribosomal peptide synthetase modules @ Bacillibactin synthetase component F (EC 2.7.7.-)
calcium-dependent_antibiotic|4|44_1|Scoe 87847 0.8186528 1580 86267 55.599367 0.9820142 Siderophore biosynthesis non-ribosomal peptide synthetase modules @ Bacillibactin synthetase component F (EC 2.7.7.-)
calcium-dependent_antibiotic|4|45_1|Scoe 63839 0.8186528 1580 62259 40.404430 0.9752502 Siderophore biosynthesis non-ribosomal peptide synthetase modules @ Bacillibactin synthetase component F (EC 2.7.7.-)
calcium-dependent_antibiotic|4|46_1|Scoe 13271 0.8968912 1731 11540 7.666667 0.8695652 Beta-ketoadipate enol-lactone hydrolase (EC 3.1.1.24)
calcium-dependent_antibiotic|4|47_1|Scoe 1043 0.5124352 989 54 1.054601 0.0517737 phosphotransferase
calcium-dependent_antibiotic|4|48_1|Scoe 21541 0.9782383 1888 19653 11.409428 0.9123532 ABC transporter, NBP/MSD fusion protein
calcium-dependent_antibiotic|4|49_1|Scoe 1165 0.3901554 753 412 1.547145 0.3536481 putative oxygenase (putative secreted protein)
calcium-dependent_antibiotic|4|50_1|Scoe 953 0.4233161 817 136 1.166463 0.1427072 FIG01125970: hypothetical protein
calcium-dependent_antibiotic|4|51_1|Scoe 904 0.3968912 766 138 1.180157 0.1526549 FIG01122924: hypothetical protein
calcium-dependent_antibiotic|4|52_1|Scoe 959 0.4243523 819 140 1.170940 0.1459854 FIG01124815: hypothetical protein
calcium-dependent_antibiotic|4|53_1|Scoe 893 0.4046632 781 112 1.143406 0.1254199 FIG01127693: hypothetical protein
calcium-dependent_antibiotic|4|54_1|Scoe 1070 0.4398964 849 221 1.260306 0.2065421 putaive isomerase
calcium-dependent_antibiotic|4|55_1|Scoe 865 0.3549223 685 180 1.262774 0.2080925 FIG00557539: hypothetical protein
calcium-dependent_antibiotic|4|56_1|Scoe 2610 0.8554404 1651 959 1.580860 0.3674330 Inositol-1-phosphate synthase (EC 5.5.1.4)
calcium-dependent_antibiotic|4|57_1|Scoe 65 0.0316062 61 4 1.065574 0.0615385 secreted protein
calcium-dependent_antibiotic|4|58_1|Scoe 13881 0.8352332 1612 12269 8.611042 0.8838700 Salicylate hydroxylase (EC 1.14.13.1)
calcium-dependent_antibiotic|4|59_1|Scoe 6332 0.8777202 1694 4638 3.737899 0.7324700 3-oxoacyl-[acyl-carrier-protein] synthase, KASIII (EC 2.3.1.180)
calcium-dependent_antibiotic|4|60_1|Scoe 6263 0.8248705 1592 4671 3.934045 0.7458087 FIG01132699: hypothetical protein
calcium-dependent_antibiotic|4|61_1|Scoe 25611 0.9139896 1764 23847 14.518707 0.9311233 3-oxoacyl-[acyl-carrier-protein] synthase, KASII (EC 2.3.1.41)
calcium-dependent_antibiotic|4|62_1|Scoe 92 0.0461140 89 3 1.033708 0.0326087 Acyl carrier protein
coelibactin|13|162_1|Scoe 21980 0.9290155 1793 20187 12.258784 0.9184258 2,3-dihydroxybenzoate-AMP ligase (EC 2.7.7.58)
coelibactin|13|163_1|Scoe 64111 0.8238342 1590 62521 40.321384 0.9751993 iron aquisition yersiniabactin synthesis enzyme (Irp2)
coelibactin|13|164_1|Scoe 65475 0.8279793 1598 63877 40.973091 0.9755937 Siderophore biosynthesis non-ribosomal peptide synthetase modules
coelibactin|13|165_1|Scoe 475 0.2119171 409 66 1.161369 0.1389474 Putative reductoisomerase in siderophore biosynthesis gene cluster
coelibactin|13|166_1|Scoe 542 0.1979275 382 160 1.418848 0.2952030 Thiazolinyl imide reductase in siderophore biosynthesis gene cluster
coelibactin|13|167_1|Scoe 20005 0.8191710 1581 18424 12.653384 0.9209698 putative cytochrome P450 hydroxylase
coelibactin|13|168_1|Scoe 6767 0.7352332 1419 5348 4.768851 0.7903059 Thioesterase in siderophore biosynthesis gene cluster
coelibactin|13|169_1|Scoe 399 0.1569948 303 96 1.316832 0.2406015 FIG01124013: hypothetical protein
coelibactin|13|170_1|Scoe 22258 0.9782383 1888 20370 11.789195 0.9151766 Transport ATP-binding protein CydC
coelibactin|13|171_1|Scoe 21734 0.9797927 1891 19843 11.493390 0.9129935 Putative ABC iron siderophore transporter, fused permease and ATPase domains
coelibactin|13|172_1|Scoe 6038 0.9538860 1841 4197 3.279739 0.6950977 Anthranilate synthase, aminase component (EC 4.1.3.27)
coelichelin|5|63_1|Scoe 4954 0.7849741 1515 3439 3.269967 0.6941865 Polymyxin synthetase PmxB
coelichelin|5|64_1|Scoe 1009 0.3424870 661 348 1.526475 0.3448959 putative esterase
coelichelin|5|65_1|Scoe 22382 0.9751295 1882 20500 11.892667 0.9159146 ABC transporter transmembrane protein
coelichelin|5|66_1|Scoe 108707 0.8212435 1585 107122 68.584858 0.9854195 Siderophore biosynthesis non-ribosomal peptide synthetase modules
coelichelin|5|67_1|Scoe 22313 0.9777202 1887 20426 11.824589 0.9154305 FIG01120908: hypothetical protein
coelichelin|5|68_1|Scoe 6181 0.7787565 1503 4678 4.112442 0.7568355 iron-siderophore binding lipoprotein
coelichelin|5|69_1|Scoe 20071 0.9803109 1892 18179 10.608351 0.9057346 ABC-type Fe3+-siderophore transport system, ATPase component
coelichelin|5|70_1|Scoe 15867 0.9419689 1818 14049 8.727723 0.8854226 ABC-type Fe3+-siderophore transport system, permease 2 component
coelichelin|5|71_1|Scoe 15963 0.9430052 1820 14143 8.770879 0.8859863 ABC-type Fe3+-siderophore transport system, permease component
coelichelin|5|72_1|Scoe 3871 0.8020725 1548 2323 2.500646 0.6001033 Siderophore biosynthesis protein, monooxygenase
coelichelin|5|73_1|Scoe 4767 0.9746114 1881 2886 2.534290 0.6054122 formyltransferase
coelimycin|10|121_1|Scoe 16975 0.9155440 1767 15208 9.606678 0.8959057 Transcriptional regulator, TetR family
coelimycin|10|122_1|Scoe 2063 0.5803109 1120 943 1.841964 0.4571013 A-factor biosynthesis protein AfsA
coelimycin|10|123_1|Scoe 10992 0.8569948 1654 9338 6.645707 0.8495269 FIG01122353: hypothetical protein
coelimycin|10|124_1|Scoe 10657 0.8886010 1715 8942 6.213994 0.8390729 putative two-component system sensor kinase
coelimycin|10|125_1|Scoe 1837 0.7984456 1541 296 1.192083 0.1611323 2-oxoglutarate oxidoreductase, beta subunit (EC 1.2.7.3)
coelimycin|10|126_1|Scoe 1836 0.7948187 1534 302 1.196871 0.1644880 2-oxoglutarate oxidoreductase, alpha subunit (EC 1.2.7.3)
coelimycin|10|127_1|Scoe 8712 0.9611399 1855 6857 4.696496 0.7870753 Biotin carboxylase of acetyl-CoA carboxylase (EC 6.3.4.14) / Biotin carboxyl carrier protein of acetyl-CoA carboxylase
coelimycin|10|128_1|Scoe 4685 0.7601036 1467 3218 3.193592 0.6868730 FIG01129816: hypothetical protein
coelimycin|10|129_1|Scoe 40757 0.7953368 1535 39222 26.551792 0.9623378 Malonyl CoA-acyl carrier protein transacylase (EC 2.3.1.39)
coelimycin|10|130_1|Scoe 71289 0.8010363 1546 69743 46.111902 0.9783136 Malonyl CoA-acyl carrier protein transacylase (EC 2.3.1.39)
coelimycin|10|131_1|Scoe 100940 0.7994819 1543 99397 65.418017 0.9847137 Malonyl CoA-acyl carrier protein transacylase (EC 2.3.1.39)
coelimycin|10|132_1|Scoe 2442 0.6160622 1189 1253 2.053827 0.5131040 secreted protein
coelimycin|10|133_1|Scoe 17628 0.9227979 1781 15847 9.897810 0.8989676 Epoxide hydrolase (EC 3.3.2.9)
coelimycin|10|134_1|Scoe 20004 0.9031088 1743 18261 11.476764 0.9128674 Antiseptic resistance protein QacA
coelimycin|10|135_1|Scoe 17612 0.9544041 1842 15770 9.561346 0.8954122 Acetylornithine aminotransferase (EC 2.6.1.11)
coelimycin|10|136_1|Scoe 16933 0.8155440 1574 15359 10.757942 0.9070454 Cys-tRNA(Pro) deacylase YbaK
coelimycin|10|137_1|Scoe 4447 0.7533679 1454 2993 3.058459 0.6730380 secreted FAD-binding protein
coelimycin|10|138_1|Scoe 20000 0.9067358 1750 18250 11.428571 0.9125000 3-oxoacyl-[acyl-carrier protein] reductase (EC 1.1.1.100)
coelimycin|10|139_1|Scoe 8851 0.8455959 1632 7219 5.423407 0.8156141 FIG01131835: hypothetical protein
coelimycin|10|140_1|Scoe 9875 0.9487047 1831 8044 5.393228 0.8145823 Acetyl-coenzyme A carboxyl transferase alpha chain (EC 6.4.1.2) / Acetyl-coenzyme A carboxyl transferase beta chain (EC 6.4.1.2); Propionyl-CoA carboxylase beta chain (EC 6.4.1.3)
coelimycin|10|141_1|Scoe 11 0.0056995 11 0 1.000000 0.0000000 hypothetical protein
coelimycin|10|142_1|Scoe 5128 0.6829016 1318 3810 3.890744 0.7429797 FIG01121841: hypothetical protein
coelimycin|10|143_1|Scoe 6893 0.7357513 1420 5473 4.854225 0.7939939 Thioesterase in siderophore biosynthesis gene cluster
coelimycin|10|144_1|Scoe 17031 0.8170984 1577 15454 10.799620 0.9074041 FIG01136508: hypothetical protein
desferrioxamine_B|3|17_1|Scoe 1647 0.6393782 1234 413 1.334684 0.2507590 Putative Desferrioxamine E transporter
desferrioxamine_B|3|18_1|Scoe 5113 0.8398964 1621 3492 3.154226 0.6829650 Hypothetical protein associated with desferrioxamine E biosynthesis
desferrioxamine_B|3|19_1|Scoe 4133 0.7880829 1521 2612 2.717291 0.6319865 Desferrioxamine E biosynthesis protein DesA @ Siderophore biosynthesis L-2,4-diaminobutyrate decarboxylase
desferrioxamine_B|3|20_1|Scoe 4083 0.8103627 1564 2519 2.610614 0.6169483 Desferrioxamine E biosynthesis protein DesB @ Siderophore biosynthesis protein, monooxygenase
desferrioxamine_B|3|21_1|Scoe 2979 0.6305699 1217 1762 2.447823 0.5914736 Desferrioxamine E biosynthesis protein DesC @ Siderophore synthetase small component, acetyltransferase
desferrioxamine_B|3|22_1|Scoe 2462 0.6580311 1270 1192 1.938583 0.4841592 Desferrioxamine E biosynthesis protein DesD @ Siderophore synthetase superfamily, group C @ Siderophore synthetase component, ligase
geosmin|9|120_1|Scoe 5575 0.6326425 1221 4354 4.565930 0.7809865 FIG01124023: hypothetical protein
hopene|12|149_1|Scoe 3685 0.8227979 1588 2097 2.320529 0.5690638 Phytoene synthase (EC 2.5.1.32)
hopene|12|150_1|Scoe 3754 0.8284974 1599 2155 2.347717 0.5740543 Phytoene synthase (EC 2.5.1.32)
hopene|12|151_1|Scoe 19 0.0098446 19 0 1.000000 0.0000000 hypothetical protein
hopene|12|152_1|Scoe 1326 0.5943005 1147 179 1.156059 0.1349925 Phytoene desaturase, pro-zeta-carotene producing (EC 1.-.-.-)
hopene|12|153_1|Scoe 7874 0.9818653 1895 5979 4.155145 0.7593345 Octaprenyl diphosphate synthase (EC 2.5.1.90); Dimethylallyltransferase (EC 2.5.1.1); (2E,6E)-farnesyl diphosphate synthase (EC 2.5.1.10); Geranylgeranyl pyrophosphate synthetase (EC 2.5.1.29)
hopene|12|154_1|Scoe 1553 0.6036269 1165 388 1.333047 0.2498390 Squalene–hopene cyclase (EC 5.4.99.17)
hopene|12|155_1|Scoe 1123 0.5673575 1095 28 1.025571 0.0249332 hypothetical protein Bcep3774, commonly clustered with carotenoid biosynthesis
hopene|12|156_1|Scoe 1417 0.6321244 1220 197 1.161475 0.1390261 Radical SAM protein required for addition of adenosine to hopane skeleton, HpnH
hopene|12|157_1|Scoe 2606 0.9549223 1843 763 1.413999 0.2927859 1-hydroxy-2-methyl-2-(E)-butenyl 4-diphosphate synthase (EC 1.17.7.1)
hopene|12|158_1|Scoe 8746 0.9766839 1885 6861 4.639788 0.7844729 1-deoxy-D-xylulose 5-phosphate synthase (EC 2.2.1.7)
hopene|12|159_1|Scoe 20012 0.9580311 1849 18163 10.823148 0.9076054 Aminotransferase HpnO, required for aminobacteriohopanetriol
hopene|12|160_1|Scoe 6824 0.8357513 1613 5211 4.230626 0.7636284 putative DNA-binding protein
hopene|12|161_1|Scoe 650 0.3290155 635 15 1.023622 0.0230769 hypothetical protein
melanin|2|15_1|Scoe 1379 0.4336788 837 542 1.647551 0.3930384 tyrosinase (monophenol monooxygenase)
melanin|2|16_1|Scoe 1180 0.3735751 721 459 1.636616 0.3889831 tyrosinase co-factor
methylenomycin|14|173_1|Scoe 7813 0.8538860 1648 6165 4.740898 0.7890695 NADH-FMN oxidoreductase
methylenomycin|14|174_1|Scoe 1056 0.4051813 782 274 1.350384 0.2594697 hypothetical protein
methylenomycin|14|175_1|Scoe 10511 0.8020725 1548 8963 6.790052 0.8527257 DNA-binding protein
methylenomycin|14|176_1|Scoe 4408 0.8056995 1555 2853 2.834727 0.6472323 Predicted dinucleotide-binding enzymes
methylenomycin|14|177_1|Scoe 8190 0.8844560 1707 6483 4.797891 0.7915751 2,4-dienoyl-CoA reductase [NADPH] (EC 1.3.1.34)
methylenomycin|14|178_1|Scoe 128 0.0616580 119 9 1.075630 0.0703125 AvrD protein
methylenomycin|14|179_1|Scoe 134 0.0507772 98 36 1.367347 0.2686567 putative ATP/GTP-binding protein, MmyX
methylenomycin|14|180_1|Scoe 6097 0.8777202 1694 4403 3.599174 0.7221584 3-oxoacyl-[acyl-carrier-protein] synthase, KASIII (EC 2.3.1.180)
methylenomycin|14|181_1|Scoe 477 0.2046632 395 82 1.207595 0.1719078 putative acyl carrier protein, MmyA
methylenomycin|14|182_1|Scoe 3944 0.9202073 1776 2168 2.220721 0.5496957 Phosphoserine phosphatase
methylenomycin|14|183_1|Scoe 135 0.0507772 98 37 1.377551 0.2740741 putative ATP/GTP-binding protein, MmyX
methylenomycin|14|184_1|Scoe 20001 0.9010363 1739 18262 11.501438 0.9130543 Permeases of the major facilitator superfamily
methylenomycin|14|185_1|Scoe 3166 0.7025907 1356 1810 2.334808 0.5716993 Transcriptional regulator, ArsR family
methylenomycin|14|186_1|Scoe 8669 0.9165803 1769 6900 4.900509 0.7959396 putative oxidoreductase
methylenomycin|14|187_1|Scoe 11630 0.8746114 1688 9942 6.889810 0.8548581 Limonene 1,2-monooxygenase
methylenomycin|14|188_1|Scoe 4156 0.6854922 1323 2833 3.141345 0.6816651 Thioesterase
methylenomycin|14|189_1|Scoe 20016 0.9466321 1827 18189 10.955665 0.9087230 Transcriptional regulator, TetR family
methylenomycin|14|190_1|Scoe 1965 0.5689119 1098 867 1.789618 0.4412214 A-factor biosynthesis protein AfsA
methylenomycin|14|191_1|Scoe 3608 0.6279793 1212 2396 2.976898 0.6640798 Pigment protein
methylenomycin|14|192_1|Scoe 3716 0.9165803 1769 1947 2.100622 0.5239505 Phosphoserine phosphatase
methylenomycin|14|193_1|Scoe 8099 0.8637306 1667 6432 4.858428 0.7941721 Transcriptional regulator MmyR, TetR family
sapB|11|145_1|Scoe 5358 0.7549223 1457 3901 3.677419 0.7280702 Lanthionine biosynthesis protein LanL
sapB|11|146_1|Scoe 361 0.1818653 351 10 1.028490 0.0277008 Lanthionine precursor peptide LanA
sapB|11|147_1|Scoe 22078 0.9792746 1890 20188 11.681482 0.9143944 FIG01133883: hypothetical protein
sapB|11|148_1|Scoe 21810 0.9782383 1888 19922 11.551907 0.9134342 FIG01121693: hypothetical protein
SCO-2138|1|1_1|Scoe 2435 0.6227979 1202 1233 2.025790 0.5063655 FIG01129357: hypothetical protein
SCO-2139|1|2_1|Scoe 2390 0.7010363 1353 1037 1.766445 0.4338912 Gluconolactonase (EC 3.1.1.17)
SCO-2140|1|3_1|Scoe 17246 0.9316062 1798 15448 9.591769 0.8957439 Transcriptional regulator, IclR family
SCO-2141|1|4_1|Scoe 915 0.4621762 892 23 1.025785 0.0251366 FIG01121703: hypothetical protein
SCO-2142|1|5_1|Scoe 19837 0.9445596 1823 18014 10.881514 0.9081010 Transcriptional regulator, GntR family
SCO-2143|1|6_1|Scoe 816 0.2559585 494 322 1.651822 0.3946078 Dicarboxylate carrier protein
SCO-2144|1|7_1|Scoe 23060 0.9461140 1826 21234 12.628697 0.9208153 putative fatty acid synthase
SCO-2145|1|8_1|Scoe 7619 0.9331606 1801 5818 4.230428 0.7636173 Acetyl-coenzyme A carboxyl transferase alpha chain (EC 6.4.1.2) / Acetyl-coenzyme A carboxyl transferase beta chain (EC 6.4.1.2)
SCO-2146|1|9_1|Scoe 16284 0.8663212 1672 14612 9.739234 0.8973225 putative secreted peptidase
SCO-2147|1|10_1|Scoe 782 0.2829016 546 236 1.432234 0.3017903 FIG01131749: hypothetical protein
SCO-2148|1|11_1|Scoe 577 0.2227979 430 147 1.341860 0.2547660 hypothetical protein
SCO-2149|1|12_1|Scoe 3052 0.7217617 1393 1659 2.190955 0.5435780 MoxR-like ATPases
SCO-2150|1|13_1|Scoe 20140 0.9455959 1825 18315 11.035616 0.9093843 FIG01122502: hypothetical protein
SCO-2151|1|14_1|Scoe 2854 0.6896373 1331 1523 2.144252 0.5336370 Rod shape-determining protein MreB
undecylprodigiosin|8|100_1|Scoe 20004 0.8865285 1711 18293 11.691408 0.9144671 Butyryl-CoA dehydrogenase (EC 1.3.99.2)
undecylprodigiosin|8|101_1|Scoe 65 0.0336788 65 0 1.000000 0.0000000 RedY protein
undecylprodigiosin|8|102_1|Scoe 20000 0.8974093 1732 18268 11.547344 0.9134000 two-component system response regulator
undecylprodigiosin|8|103_1|Scoe 67 0.0341969 66 1 1.015151 0.0149254 RedV protein
undecylprodigiosin|8|104_1|Scoe 120 0.0611399 118 2 1.016949 0.0166667 FIG01126548: hypothetical protein
undecylprodigiosin|8|105_1|Scoe 50 0.0259067 50 0 1.000000 0.0000000 hypothetical protein
undecylprodigiosin|8|106_1|Scoe 26 0.0134715 26 0 1.000000 0.0000000 hypothetical protein
undecylprodigiosin|8|107_1|Scoe 25705 0.9145078 1765 23940 14.563739 0.9313363 3-oxoacyl-[acyl-carrier-protein] synthase, KASII (EC 2.3.1.41)
undecylprodigiosin|8|108_1|Scoe 912 0.3362694 649 263 1.405239 0.2883772 Acyl carrier protein
undecylprodigiosin|8|109_1|Scoe 6755 0.8777202 1694 5061 3.987603 0.7492228 3-oxoacyl-[acyl-carrier-protein] synthase, KASIII (EC 2.3.1.41)
undecylprodigiosin|8|110_1|Scoe 200 0.0979275 189 11 1.058201 0.0550000 Acyl carrier protein
undecylprodigiosin|8|111_1|Scoe 4155 0.8362694 1614 2541 2.574349 0.6115523 Aminotransferase class II, serine palmitoyltransferase like (EC 2.3.1.50)
undecylprodigiosin|8|112_1|Scoe 35354 0.8388601 1619 33735 21.836936 0.9542060 non-ribosomal peptide synthetase
undecylprodigiosin|8|113_1|Scoe 29318 0.8492228 1639 27679 17.887736 0.9440958 Capsular polysaccharide biosynthesis fatty acid synthase WcbR
undecylprodigiosin|8|114_1|Scoe 12082 0.9331606 1801 10281 6.708495 0.8509353 probable oxidoreductase
undecylprodigiosin|8|115_1|Scoe 6650 0.7378238 1424 5226 4.669944 0.7858647 putative thioesterase
undecylprodigiosin|8|116_1|Scoe 2639 0.7170984 1384 1255 1.906792 0.4755589 putative methyltransferase
undecylprodigiosin|8|117_1|Scoe 3142 0.6393782 1234 1908 2.546191 0.6072565 Pyruvate-utilizing enzyme, similar to phosphoenolpyruvate synthase
undecylprodigiosin|8|118_1|Scoe 3303 0.7129534 1376 1927 2.400436 0.5834090 Rieske (2Fe-2S) domain protein
undecylprodigiosin|8|119_1|Scoe 673 0.3212435 620 53 1.085484 0.0787519 FIG01125690: hypothetical protein
undecylprodigiosin|8|98_1|Scoe 17096 0.8160622 1575 15521 10.854603 0.9078732 FIG01131857: hypothetical protein
undecylprodigiosin|8|99_1|Scoe 47731 0.8601036 1660 46071 28.753615 0.9652218 FIG01134662: hypothetical protein
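
The numeric columns in the rows above appear to be arithmetically related, and the sketch below reproduces them for one row. The column names are hypothetical labels, and the total of 1930 genomes is not stated in the text; it is inferred from the ratio between the third and second numeric columns. Treat both as assumptions rather than as the pipeline's own definitions.

```python
def derived_columns(total_hits, genomes_with_hit, n_genomes):
    """Recompute the derived table columns from the two raw counts.

    Column names are illustrative labels, not names used by EvoMining.
    """
    extra_copies = total_hits - genomes_with_hit
    return {
        # share of all genomes that carry at least one homolog
        "fraction_of_genomes": genomes_with_hit / n_genomes,
        # homologs beyond the first copy in each genome
        "extra_copies": extra_copies,
        # mean number of homologs per genome that has the enzyme
        "copies_per_genome": total_hits / genomes_with_hit,
        # share of all homologs that are extra copies
        "fraction_extra": extra_copies / total_hits,
    }


# Row 107_1 (KASII): 25705 hits in 1765 genomes; 1930 genomes is an inferred total.
row = derived_columns(total_hits=25705, genomes_with_hit=1765, n_genomes=1930)
for name, value in row.items():
    print(name, round(value, 7))
```

Under these assumptions, a family such as 107_1 is both widespread (found in most genomes) and strongly expanded (many copies per genome), whereas rows such as 105_1 have a single copy in a few genomes.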

Presence/absence: EvoMining was run over enzymes with an expansion number between 0.1 and 0.6.

Figure 4. Pan-cluster idea on closed Streptomyces

Open/closed coelicolor: how spread is the cluster?

We took 15 Streptomyces coelicolor clusters from MIBiG and analyzed the open/closed pan-cluster of each according to EvoMining run backwards.
That is, 15 CORASON runs; I do not need to choose the query enzymes, at least 3 per cluster… and they should not be NRPS or PKS.
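
One common way to judge whether a pan-cluster is open or closed is to look at how the number of distinct gene families grows as genomes are added. The sketch below computes such an accumulation curve from a presence/absence table, averaged over random genome orders. It is only an illustration: the function name, the toy data and the number of permutations are all assumptions, and nothing here is taken from the EvoMining code.

```python
import random
from statistics import mean


def accumulation_curve(presence, n_permutations=100, seed=0):
    """Mean number of distinct gene families seen after adding 1..N genomes.

    `presence` maps a genome name to the set of cluster gene families
    detected in that genome (a presence/absence table in set form).
    """
    rng = random.Random(seed)
    genomes = list(presence)
    counts_at_step = [[] for _ in genomes]
    for _ in range(n_permutations):
        order = genomes[:]
        rng.shuffle(order)  # random order in which genomes are added
        seen = set()
        for step, genome in enumerate(order):
            seen |= presence[genome]
            counts_at_step[step].append(len(seen))
    return [mean(counts) for counts in counts_at_step]


# Hypothetical presence/absence data, not taken from the study.
toy = {
    "genome_A": {"f1", "f2", "f3"},
    "genome_B": {"f1", "f2"},
    "genome_C": {"f1", "f2", "f4"},
    "genome_D": {"f1", "f2", "f3"},
}
curve = accumulation_curve(toy)
print(curve)
```

A curve that keeps rising as genomes are added would suggest an open pan-cluster, while one that flattens early would suggest a closed one; where to draw that line is a modelling choice left open here.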

Methodology

[@dufresne_algorithmique_2016,@blin_recent_nodate,@kurtboke_revisiting_2017,@miller_interpreting_2017,@schniete_expanding_2017,@kim_recent_2017,@robertsen_toward_2017,@juarez-vazquez_evolution_nodate,@chavali_bioinformatics_nodate,@tracanna_mining_2017,@ren_breaking_2017,@choudhary_current_2017,@alanjary_antibiotic_2017,@chevrette_sandpuma:_2017,@wohlleben_antibiotic_2016,@weber_secondary_2016]

References

Alanjary, Mohammad, Brent Kronmiller, Martina Adamek, Kai Blin, Tilmann Weber, Daniel Huson, Benjamin Philmus, and Nadine Ziemert. 2017. “The Antibiotic Resistant Target Seeker (ARTS), an Exploration Engine for Antibiotic Cluster Prioritization and Novel Drug Target Discovery.” Nucleic Acids Research 45 (W1): W42–W48. doi:10.1093/nar/gkx360.

Barona-Gómez, Francisco, Pablo Cruz-Morales, and Lianet Noda-García. 2012. “What Can Genome-Scale Metabolic Network Reconstructions Do for Prokaryotic Systematics?” Antonie van Leeuwenhoek 101 (1): 35–43. doi:10.1007/s10482-011-9655-1.

Blin, Kai, Hyun Uk Kim, Marnix H. Medema, and Tilmann Weber. 2017. “Recent Development of antiSMASH and Other Computational Approaches to Mine Secondary Metabolite Biosynthetic Gene Clusters.” Briefings in Bioinformatics. Accessed January 16. doi:10.1093/bib/bbx146.

Chavali, Arvind K., and Seung Y. Rhee. 2018. “Bioinformatics Tools for the Identification of Gene Clusters That Biosynthesize Specialized Metabolites.” Briefings in Bioinformatics. Accessed January 16. doi:10.1093/bib/bbx020.

Chevrette, Marc G., Fabian Aicheler, Oliver Kohlbacher, Cameron R. Currie, and Marnix H. Medema. 2017. “SANDPUMA: Ensemble Predictions of Nonribosomal Peptide Chemistry Reveal Biosynthetic Diversity Across Actinobacteria.” Bioinformatics 33 (20): 3202–10. doi:10.1093/bioinformatics/btx400.

Choudhary, Alka, Lynn M. Naughton, Itxaso Montánchez, Alan D. W. Dobson, and Dilip K. Rai. 2017. “Current Status and Future Prospects of Marine Natural Products (MNPs) as Antimicrobials.” Marine Drugs 15 (9): 272. doi:10.3390/md15090272.

Cibrián-Jaramillo, Angélica, and Francisco Barona-Gómez. 2016. “Increasing Metagenomic Resolution of Microbiome Interactions Through Functional Phylogenomics and Bacterial Sub-Communities.” Frontiers in Genetics 7. doi:10.3389/fgene.2016.00004.

Cruz-Morales, Pablo, Johannes Florian Kopp, Christian Martínez-Guerrero, Luis Alfonso Yáñez-Guerra, Nelly Selem-Mojica, Hilda Ramos-Aboites, Jörg Feldmann, and Francisco Barona-Gómez. 2016. “Phylogenomic Analysis of Natural Products Biosynthetic Gene Clusters Allows Discovery of Arseno-Organic Metabolites in Model Streptomycetes.” Genome Biology and Evolution 8 (6): 1906–16. doi:10.1093/gbe/evw125.

Dufresne, Yoann. 2016. “Algorithmique Pour L’annotation Automatique de Peptides Non Ribosomiques.” PhD thesis, Lille1. https://tel.archives-ouvertes.fr/tel-01563992/document.

Juárez-Vázquez, Ana Lilia, Janaka N Edirisinghe, Ernesto A Verduzco-Castro, Karolina Michalska, Chenggang Wu, Lianet Noda-García, Gyorgy Babnigg, et al. 2017. “Evolution of Substrate Specificity in a Retained Enzyme Driven by Gene Loss.” eLife 6. Accessed January 16. doi:10.7554/eLife.22679.

Kim, Hyun Uk, Kai Blin, Sang Yup Lee, and Tilmann Weber. 2017. “Recent Development of Computational Resources for New Antibiotics Discovery.” Current Opinion in Microbiology 39 (October): 113–20. doi:10.1016/j.mib.2017.10.027.

Kurtböke, İpek. 2017. “Revisiting Biodiscovery from Microbial Sources in the Light of Molecular Advances.” Microbiology Australia 38 (2): 58–61. doi:10.1071/MA17028.

Medema, Marnix H., and Michael A. Fischbach. 2015. “Computational Approaches to Natural Product Discovery.” Nature Chemical Biology 11 (9): 639–48. doi:10.1038/nchembio.1884.

Medema, Marnix H., Renzo Kottmann, Pelin Yilmaz, Matthew Cummings, John B. Biggins, Kai Blin, Irene de Bruijn, et al. 2015. “Minimum Information About a Biosynthetic Gene Cluster.” Nature Chemical Biology 11 (9): 625–31. doi:10.1038/nchembio.1890.

Miller, Ian J., Marc G. Chevrette, and Jason C. Kwan. 2017. “Interpreting Microbial Biosynthesis in the Genomic Age: Biological and Practical Considerations.” Marine Drugs 15 (6): 165. doi:10.3390/md15060165.

Ren, Hengqian, Bin Wang, and Huimin Zhao. 2017. “Breaking the Silence: New Strategies for Discovering Novel Natural Products.” Current Opinion in Biotechnology, Chemical biotechnology • Pharmaceutical biotechnology, 48 (December): 21–27. doi:10.1016/j.copbio.2017.02.008.

Robertsen, Helene Lunde, Tilmann Weber, Hyun Uk Kim, and Sang Yup Lee. 2017. “Toward Systems Metabolic Engineering of Streptomycetes for Secondary Metabolites Production.” Biotechnology Journal 13 (1). doi:10.1002/biot.201700465.

Schniete, Jana K., Pablo Cruz-Morales, Nelly Selem, Lorena T. Fernandez-Martinez, Iain S. Hunter, Francisco Barona-Gomez, and Paul Hoskisson. 2017. “Expanding Gene Families Helps Generate The Metabolic Robustness Required For Antibiotic Biosynthesis.” BioRxiv, March, 119354. doi:10.1101/119354.

Tracanna, Vittorio, Anne de Jong, Marnix H. Medema, and Oscar P. Kuipers. 2017. “Mining Prokaryotes for Antimicrobial Compounds: From Diversity to Function.” FEMS Microbiology Reviews 41 (3): 417–29. doi:10.1093/femsre/fux014.

Weber, Tilmann, and Hyun Uk Kim. 2016. “The Secondary Metabolite Bioinformatics Portal: Computational Tools to Facilitate Synthetic Biology of Secondary Metabolite Production.” Synthetic and Systems Biotechnology, Special Issue on “Bioinformatic tools and approaches for Synthetic Biology of natural products”, 1 (2): 69–79. doi:10.1016/j.synbio.2015.12.002.

Wohlleben, Wolfgang, Yvonne Mast, Evi Stegmann, and Nadine Ziemert. 2016. “Antibiotic Drug Discovery.” Microbial Biotechnology 9 (5): 541–48. doi:10.1111/1751-7915.12388.

Ziemert, Nadine, Mohammad Alanjary, and Tilmann Weber. 2016. “The Evolution of Genome Mining in Microbes – a Review.” Natural Product Reports 33 (8): 988–1005. doi:10.1039/C6NP00025H.